library(tidyverse)
library(ggplot2)
library(plotly)
library(gganimate)
Final Project: CO2 Emissions by Country
Data Description
In this project, I examine the CO2 Emissions Estimates data from the UN database. For each country and estimate year (1975, 1985, 2005, 2010, 2015, 2018, 2019, and 2020), the data set records the country’s total emissions (in thousand metric tons of carbon dioxide) and its emissions per capita (in metric tons of carbon dioxide), along with footnotes and a source. Though this data is valuable on its own, it also leaves room for further analysis. The main questions I wanted to answer were whether there is a relationship between GDP per capita and CO2 emissions per capita, and which countries have had the largest increases and decreases in per capita emissions since the first year of estimates, 1975.
CO2 <- read_csv("CO2_Emissions.csv", skip = 1) # reads in the UN CO2 emissions data; skip = 1 drops the title row so the real header row supplies the column names
New names:
• `` -> `...2`
Rows: 2264 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ...2, Series, Footnotes, Source
dbl (2): Region/Country/Area, Year
num (1): Value
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
New names:
• `` -> `...3`
• `` -> `...4`
• `` -> `...5`
• `` -> `...6`
• `` -> `...7`
Rows: 6776 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): T13, Gross domestic product and gross domestic product per capita, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(UNGDP, 10) #preview the data
# A tibble: 10 × 7
T13 Gross domestic product an…¹ ...3 ...4 ...5 ...6 ...7
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Region/Country/Area <NA> Year Seri… Value Foot… Sour…
2 1 Total, all countries or ar… 1995 GDP … 31,2… <NA> Unit…
3 1 Total, all countries or ar… 2005 GDP … 47,7… <NA> Unit…
4 1 Total, all countries or ar… 2010 GDP … 66,5… <NA> Unit…
5 1 Total, all countries or ar… 2015 GDP … 75,2… <NA> Unit…
6 1 Total, all countries or ar… 2019 GDP … 87,7… <NA> Unit…
7 1 Total, all countries or ar… 2020 GDP … 85,3… <NA> Unit…
8 1 Total, all countries or ar… 2021 GDP … 96,6… <NA> Unit…
9 1 Total, all countries or ar… 1995 GDP … 5,446 <NA> Unit…
10 1 Total, all countries or ar… 2005 GDP … 7,287 <NA> Unit…
# ℹ abbreviated name:
# ¹`Gross domestic product and gross domestic product per capita`
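Though this preview doesn’t show the complete data, it gives a rudimentary synopsis of the variables the GDP table holds. The data is dirty: the real column headers (“Region/Country/Area”, “Year”, “Series”, “Value”, and so on) sit in the first data row, so most columns were given placeholder names such as `...3`; every column was read as text; the values contain thousands separators; and regional aggregates such as “Total, all countries or areas” are mixed in with individual countries. It needs to be cleaned before it can answer these questions.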
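One way to avoid much of this cleanup is to read the GDP file the same way as the CO2 file, with `skip = 1`, and to declare the column types up front. The sketch below assumes the GDP CSV has the same layout as the CO2 file (a title row, then a header row with `Region/Country/Area`, an unnamed country column, `Year`, `Series`, `Value`, `Footnotes`, `Source`). Rather than reading the real file, it writes a two-row stand-in to a temporary file, using two 2020 values that appear in this report and a placeholder source string.

```r
library(readr)
library(dplyr)

# Stand-in for the UN GDP CSV (assumed layout): a title row, the real header
# row, then data rows. The Source text is a placeholder, not the real source.
lines <- c(
  "T13,Gross domestic product and gross domestic product per capita,,,,,",
  "Region/Country/Area,,Year,Series,Value,Footnotes,Source",
  '1,"Total, all countries or areas",2020,GDP per capita (US dollars),"10,883",,placeholder',
  '2,Africa,2020,GDP per capita (US dollars),"1,799",,placeholder'
)
gdp_file <- tempfile(fileext = ".csv")
writeLines(lines, gdp_file)

gdp <- read_csv(
  gdp_file,
  skip = 1, # drop the title row so the real header row supplies the names
  col_types = cols(
    Year = col_integer(),
    Value = col_number(), # col_number() drops the thousands separators
    .default = col_character()
  )
) |>
  rename(Country = 2) # the country column has no header in the file

gdp
```

With names and types fixed when the file is read, the later renaming and comma-removal steps would become unnecessary, although the regional aggregates would still need to be filtered out.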
Data Transformation
The CO2 data will be easier to work with if total emissions and emissions per capita are in separate columns. Next, we can move those columns to directly after the “Year” column. We can also drop the country code, since it isn’t needed to answer our questions, and rename the columns. Finally, the NAs in the footnotes can be replaced with “None”.
CO2_wider <- CO2 |>
  pivot_wider(names_from = 4, values_from = 5) |> # makes new columns for total emissions and emissions per capita (column 4 is Series, column 5 is Value)
  relocate(`Emissions (thousand metric tons of carbon dioxide)`, `Emissions per capita (metric tons of carbon dioxide)`, .after = `Year`) |> # moves emissions to directly after Year
  select(2:7) # drops the country code, which contains no information we need
CO2_Named <- setNames(CO2_wider, c("Country", "Year", "Emissions (thousand metric tons of carbon dioxide)", "Emissions per capita (metric tons of carbon dioxide)", "Footnotes", "Source")) # names the columns of CO2_wider
CO2Clean <- CO2_Named |>
  mutate(Footnotes = replace_na(Footnotes, "None")) # replaces NA footnotes with "None"
head(CO2Clean)
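Positional arguments such as `names_from = 4` work, but they silently pick the wrong columns if the column order ever changes. Below is a minimal sketch of the same reshaping with the columns named explicitly, run on a made-up two-country table shaped like the UN CO2 data (the countries, values, footnote, and source are hypothetical).

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical long-format table shaped like the UN CO2 data:
# one row per country, year and series (countries and values are made up)
co2_long <- tibble(
  Country = c("Country A", "Country A", "Country B", "Country B"),
  Year = 2020,
  Series = rep(c("Emissions (thousand metric tons of carbon dioxide)",
                 "Emissions per capita (metric tons of carbon dioxide)"), times = 2),
  Value = c(100, 2.5, 300, 4.0),
  Footnotes = c(NA, NA, "Example footnote", "Example footnote"),
  Source = "Example source"
)

co2_wide <- co2_long |>
  # name the key and value columns instead of using positions 4 and 5
  pivot_wider(names_from = Series, values_from = Value) |>
  # move both new emissions columns to directly after Year
  relocate(starts_with("Emissions"), .after = Year) |>
  # mark rows without a footnote explicitly
  mutate(Footnotes = replace_na(Footnotes, "None"))

co2_wide
```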
This data is much easier to work with. Now for the GDP data. We can start by renaming the columns, using “x” as a placeholder name for the columns we want to remove, and then removing them. Since 2020 is the most recent year shared by the GDP data and the CO2 data, we’ll zero in on that year. Finally, we only need GDP per capita, not total GDP.
GDP_named2020 <- setNames(UNGDP, c("x", "Country", "Year", "Series", "Value", "x", "x")) |> # renames columns
  select(!contains("x")) |> # removes unwanted columns
  filter(str_detect(Year, "2020")) |> # keeps only the year 2020
  filter(str_detect(Series, "GDP per capita")) # keeps GDP per capita, not total GDP
head(GDP_named2020)
# A tibble: 6 × 4
Country Year Series Value
<chr> <chr> <chr> <chr>
1 Total, all countries or areas 2020 GDP per capita (US dollars) 10,883
2 Africa 2020 GDP per capita (US dollars) 1,799
3 Northern Africa 2020 GDP per capita (US dollars) 3,039
4 Sub-Saharan Africa 2020 GDP per capita (US dollars) 1,518
5 Eastern Africa 2020 GDP per capita (US dollars) 951
6 Middle Africa 2020 GDP per capita (US dollars) 1,054
Now we can build a single data set that combines the pertinent columns from the GDP data and the CO2 data. We start by keeping only the 2020 rows of the CO2 data, since 2020 is the most recent year the two data sets share. We then join the data sets on the “Country” column; because the CO2 data is on the left side of the join, the regions, totals, and other GDP rows that have no matching country in the CO2 data are dropped. Next, we rename the columns of the new data set, again using “x” for the ones we don’t need, remove those, and move GDP per capita to directly after “Year”. The GDP values still contain commas and are stored as text, so we remove the commas and convert the values to numbers in a new “GDP” column. Finally, a few country names are read in with special characters that R does not decode correctly, and these need to be replaced.
CO2Clean2020 <- CO2Clean |>
  filter(str_detect(Year, "2020")) # most recent common year between CO2 and GDP
CO2GDP_clean <- left_join(CO2Clean2020, GDP_named2020, by = "Country") # joins CO2 and GDP by country
CO2GDP_clean <- setNames(CO2GDP_clean, c("Country", "Year", "Emissions (thousand metric tons of carbon dioxide)", "Emissions per capita (metric tons of carbon dioxide)", "x", "source", "x", "x", "GDP per capita (US dollars)")) |> # sets names; the x's are Footnotes and the GDP table's Year and Series
  select(!starts_with("x")) |> # removes x columns
  relocate("GDP per capita (US dollars)", .after = "Year") # moves GDP to directly after Year
CO2GDP_clean <- CO2GDP_clean |>
  mutate(GDP = as.numeric(str_remove_all(`GDP per capita (US dollars)`, ","))) # removes every comma and turns GDP into a numeric value
CO2GDP_clean$Country[29] <- "Côte d'Ivoire" # replaces names R could not decode
CO2GDP_clean$Country[32] <- "Curaçao"
CO2GDP_clean$Country[135] <- "Türkiye"
head(CO2GDP_clean)
# A tibble: 6 × 7
Country Year `GDP per capita (US dollars)` Emissions (thousand metric ton…¹
<chr> <dbl> <chr> <dbl>
1 Albania 2020 5,278 3512
2 Algeria 2020 3,354 135599
3 Angola 2020 1,640 16939
4 Argentina 2020 8,561 150666
5 Armenia 2020 4,506 6464
6 Australia 2020 55,774 378417
# ℹ abbreviated name: ¹`Emissions (thousand metric tons of carbon dioxide)`
# ℹ 3 more variables:
# `Emissions per capita (metric tons of carbon dioxide)` <dbl>, source <chr>,
# GDP <dbl>
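Hard-coding row numbers such as `CO2GDP_clean$Country[29]` breaks as soon as the rows shift. If the garbled names come from the file’s text encoding, a sturdier fix is to tell `read_csv()` which encoding to use. The sketch below assumes, without having checked, that the UN download is Latin-1 encoded; it writes a one-row Latin-1 file and reads it back with `locale(encoding = "latin1")`.

```r
library(readr)

# Write a tiny file whose bytes are Latin-1 encoded (an assumption about the
# UN download, not something verified here). "\u00e7" is the letter ç.
latin1_text <- iconv("Country\nCura\u00e7ao\n", from = "UTF-8", to = "latin1")
csv_file <- tempfile(fileext = ".csv")
writeBin(charToRaw(latin1_text), csv_file)

# read_csv() assumes UTF-8 by default, which would not decode the ç correctly;
# declaring the encoding makes readr convert the text to UTF-8 on the way in.
countries <- read_csv(csv_file,
                      locale = locale(encoding = "latin1"),
                      show_col_types = FALSE)
countries$Country
```

If the file turns out to be UTF-8 already, `locale()` is not the fix. Either way, `anti_join(CO2Clean2020, GDP_named2020, by = "Country")` lists the CO2 countries that found no GDP match, which is a quick way to spot names spelled or encoded differently in the two files.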
Next, to determine which countries have had the largest changes in CO2 emissions per capita, we need a new column that captures those changes. To do this, we keep only the years 1975 and 2020 and pivot them into a separate column for each year. From there, we can create a column with the difference between the two years. Finally, we create two data sets: one with the 10 countries with the largest increase in emissions per capita, and another with the 10 countries with the largest decrease. We reorder the countries in each so that the bars are sorted when graphed.
CO2_Wide_years <- CO2Clean |>
  filter(str_detect(Year, "2020|1975")) |>
  select(!contains("thousand")) # keeps per capita emissions for 1975 and 2020 only
CO2_Wide_years <- pivot_wider(CO2_Wide_years, names_from = "Year", values_from = "Emissions per capita (metric tons of carbon dioxide)") # creates one column for 1975 and one for 2020
CO2_Wide_years <- CO2_Wide_years |>
  mutate(Difference = `2020` - `1975`) # change in emissions per capita from 1975 to 2020
CO2shortH <- CO2_Wide_years |>
  arrange(desc(Difference)) |> # sorts from highest to lowest difference
  head(10) |> # top 10 countries with the highest increase in emissions per capita
  mutate(Country = fct_reorder(Country, Difference)) # orders the bars by Difference
CO2shortT <- CO2_Wide_years |>
  drop_na(Difference) |> # drops countries missing either year
  arrange(Difference) |> # sorts from lowest to highest difference
  head(10) # top 10 countries with the highest decrease in emissions per capita
CO2shortT$Country[1] <- "Curaçao" # replaces the name R could not decode
CO2shortT <- CO2shortT |>
  mutate(Country = fct_reorder(Country, Difference)) # orders the bars by Difference
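One thing to watch in this step: `pivot_wider()` treats every column not named in `names_from` or `values_from` as an identifier, so here `Footnotes` and `Source` help define the rows. If a country’s footnote differs between 1975 and 2020, that country is split across two rows and its `Difference` becomes NA. The made-up example below (hypothetical countries and values) shows the split, one way around it with `id_cols = Country`, and `slice_max()`/`slice_min()` as an alternative to `arrange()` followed by `head()`.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical per capita emissions for three made-up countries.
# Country B has a footnote in 1975 but not in 2020.
per_capita <- tibble(
  Country = rep(c("Country A", "Country B", "Country C"), each = 2),
  Year = rep(c(1975, 2020), times = 3),
  emissions = c(1.0, 3.0, 8.0, 5.0, 2.0, 2.5),
  Footnotes = c(NA, NA, "Example footnote", NA, NA, NA)
)

# Default identifiers are Country and Footnotes, so Country B is split in two
naive <- per_capita |>
  pivot_wider(names_from = Year, values_from = emissions)

# Identifying rows by Country alone keeps one row per country
wide <- per_capita |>
  pivot_wider(id_cols = Country, names_from = Year, values_from = emissions) |>
  mutate(Difference = `2020` - `1975`)

top_increase <- slice_max(wide, Difference, n = 1) # largest increase
top_decrease <- slice_min(wide, Difference, n = 1) # largest decrease
```

Note that `id_cols = Country` drops `Footnotes` and `Source` from the wide table; they could be joined back by country if they are needed later.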
Now both data sets are ready to be visualized. To examine the relationship between GDP per capita and emissions per capita across 150 countries, a scatter plot with a geom_smooth() trend line is the most effective choice, since it shows every country while highlighting the overall pattern. A simple bar chart will display the 10 countries with the largest increase and the 10 with the largest decrease in CO2 emissions per capita.
GDPgraph <- ggplot(CO2GDP_clean, aes(x = GDP, y = `Emissions per capita (metric tons of carbon dioxide)`, label = Country)) +
  geom_point() + # adds a point for each country
  geom_smooth() + # adds a smoothed trend line
  labs(title = "GDP per capita and emissions per capita", x = "GDP per capita (US dollars)", y = "Emissions per Capita (metric tons of carbon dioxide)") + # creates labels
  theme(axis.title.y = element_text(size = 8)) # changes y axis title size
Increase <- ggplot(CO2shortH, aes(x = Country, y = as.numeric(Difference))) +
  geom_col() + # bar graph of the 10 countries with the highest increase (CO2shortH)
  theme(axis.text.x = element_text(angle = 55, hjust = 1)) + # angles the country names
  labs(title = "Countries with highest increase in emissions per capita", x = "Country", y = "Change in Emissions per Capita (metric tons of carbon dioxide)") + # creates labels
  theme(axis.title.y = element_text(size = 5)) # changes y axis title size
Decrease <- ggplot(CO2shortT, aes(x = Country, y = as.numeric(Difference))) +
  geom_col() + # bar graph of the 10 countries with the highest decrease (CO2shortT)
  theme(axis.text.x = element_text(angle = 55, hjust = 1)) + # angles the country names
  labs(title = "Countries with highest decrease in emissions per capita", x = "Country", y = "Change in Emissions per Capita (metric tons of carbon dioxide)") + # creates labels
  theme(axis.title.y = element_text(size = 5)) # changes y axis title size
Analysis and Visualization
GDP per capita and CO2 Emissions
ggplotly(GDPgraph)
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_smooth()`).
Warning: The following aesthetics were dropped during statistical transformation: label.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
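The second warning appears because `label = Country` is set in the top-level `aes()`, so `geom_smooth()` inherits an aesthetic it cannot use. A minimal sketch of one alternative, assuming the `label` mapping is only there to put country names in the plotly tooltips: map a `text` aesthetic in `geom_point()` alone and ask `ggplotly()` to show it. The data frame is made up, and a linear smoother is used only because the toy data has four points.

```r
library(ggplot2)
library(plotly)

# Made-up stand-in for CO2GDP_clean (hypothetical countries and values)
toy <- data.frame(
  Country = c("Country A", "Country B", "Country C", "Country D"),
  GDP = c(1000, 5000, 20000, 60000),
  per_capita = c(0.5, 2.0, 6.0, 9.0)
)

p <- ggplot(toy, aes(x = GDP, y = per_capita)) +
  # text is mapped only on the points, so the smoother inherits x and y only;
  # ggplot2 may warn that geom_point() ignores the unknown "text" aesthetic
  geom_point(aes(text = Country)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Show x, y and the country name in each point's tooltip
widget <- ggplotly(p, tooltip = c("x", "y", "text"))
```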
This plot shows the relationship between GDP per capita in US dollars and emissions per capita in metric tons of carbon dioxide. The smoothed trend of emissions per capita as a function of GDP per capita looks roughly logarithmic: generally, a country’s emissions per capita increase as its GDP per capita increases, until GDP per capita reaches about $28,000, after which the curve begins to flatten out. This is only a general trend, however; many countries lie well above or below the line. Note especially one outlier, Qatar, with a GDP per capita of $52,316 and emissions per capita of 29.2 metric tons of CO2. Other countries stand out as well. Luxembourg, at the far right, has a GDP per capita of $117,724 and emissions per capita of 11.8 metric tons of CO2. Australia and the United States have GDP per capita figures slightly below and slightly above half of Luxembourg’s, respectively, yet both countries have higher emissions per capita than Luxembourg.
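The “roughly logarithmic” description can be checked rather than read off the loess curve. A minimal sketch on made-up points generated close to a logarithmic curve (not the project data): fit emissions against log(GDP) and against GDP, then compare R². On the real data, the same two `lm()` calls could be run on `CO2GDP_clean` using `GDP` and the per capita emissions column.

```r
# Made-up data lying close to y = 2 + 1.5 * log(x), with small perturbations
gdp <- c(1000, 3000, 10000, 30000, 60000, 120000)
toy <- data.frame(
  gdp = gdp,
  emissions = 2 + 1.5 * log(gdp) + c(0.1, -0.1, 0.05, -0.05, 0.1, -0.1)
)

log_fit <- lm(emissions ~ log(gdp), data = toy) # emissions linear in log(GDP)
lin_fit <- lm(emissions ~ gdp, data = toy)      # emissions linear in GDP

r2_log <- summary(log_fit)$r.squared
r2_lin <- summary(lin_fit)$r.squared
c(log = r2_log, linear = r2_lin)
```

To draw such a fit on the existing chart, `geom_smooth(method = "lm", formula = y ~ log(x))` can replace or sit alongside the default smoother.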
Highest Increase and Decrease in per capita CO2 Emissions
ggplotly(Decrease)
ggplotly(Increase)
These are the 10 countries with the largest decrease and the largest increase in emissions per capita. Curaçao had the largest decrease, falling by 47.3 metric tons per person, and Gibraltar had the largest increase, rising by 16.9 metric tons per person. The median difference across all countries is 0.3 metric tons per person. The countries on each list were surprising. The United States is commonly thought of as a country with consumerist and wasteful tendencies, yet it has decreased its per capita emissions sharply. China, on the other hand, has had a much higher increase. Additionally, more of the countries with large increases in emissions per capita are typically considered less developed, while all of the countries with large decreases are typically considered highly developed. This pattern is less surprising: countries with higher development indexes likely have better technology for reducing emissions and stricter emissions standards for manufacturing and industry.
Reflection
This data offers valuable insights into different countries and their CO2 emissions from 1975 to 2020. On its own, it reveals which countries have had the largest increase and decrease in emissions per capita since 1975, along with other information such as which countries currently have the highest and lowest emissions. When merged with GDP data, it shows CO2 emissions per capita as a function of GDP per capita: emissions rise sharply as GDP increases at first, then taper off roughly logarithmically.
These findings, though valuable, leave much to be desired. How does the authority of a government affect how much CO2 a country produces? What role does technological development play in emissions? What are the expected consequences of different levels of per capita emissions? Which policies have the largest impact on CO2 emissions? Answering these questions would require more data on technological development, on how much governments intervene both in emissions directly and in the market, climate modeling data, and detailed policy data. Much of this data, however, is also available through the UN database, so these questions could likely be explored further.
Bibliography
Wickham, Hadley, et al. “R for Data Science (2e).” R for Data Science (2e), r4ds.hadley.nz/. Accessed 16 Aug. 2024.
Long, James (JD), and Paul Teetor. “R Cookbook, 2nd Edition.” R Cookbook, 2nd Edition, 26 Sept. 2019, rc2e.com/.
R Core Team. “R: A Language and Environment for Statistical Computing.” The R Project for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2021, www.R-project.org.
“Undata.” United Nations, United Nations, data.un.org/. Accessed 16 Aug. 2024.